Prefix free lists#1042

Closed
Wreck-X wants to merge 2 commits into datalab-to:master from Wreck-X:prefix-free-lists

Wreck-X commented Jun 3, 2026

No description provided.

Wreck-X added 2 commits June 3, 2026 14:40
- Disable group_lists/unmark_lists in StructureBuilder so surya-labeled
  ListItems stay individual instead of being merged into ListGroups or
  demoted to Text.
- Add ListItemLineExplodeProcessor: split every multi-line ListItem into
  one item per Line child (per-line bboxes).
- Add ListItemGapClusterProcessor: alternate strategy that clusters
  lines by inter-line vertical gap (default 1.5x median gap).
- Wire ListItemLineExplodeProcessor into the default pipeline; swap the
  entry in default_processors to use the gap-cluster strategy.
- scripts/patch_surya_label.py: relabel surya's <form> -> ListItem so
  Form regions also flow through this path.
- scripts/marker_view.py: HTML bbox viewer for debugging.
- PATCHES.md: documentation of all changes.
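
The gap-cluster strategy described above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual `ListItemGapClusterProcessor`: the function name `cluster_lines_by_gap`, the `BBox` tuple layout, and the `gap_factor` parameter are assumptions made for the example. It takes per-line bounding boxes, measures the vertical gap between consecutive lines, and starts a new cluster wherever a gap exceeds 1.5x the median gap.

```python
from statistics import median
from typing import List, Tuple

# Assumed layout for illustration: (x0, y0, x1, y1), with y growing downward.
BBox = Tuple[float, float, float, float]


def cluster_lines_by_gap(
    line_bboxes: List[BBox], gap_factor: float = 1.5
) -> List[List[BBox]]:
    """Group lines into clusters, splitting where the vertical gap between
    consecutive lines exceeds gap_factor times the median gap."""
    if not line_bboxes:
        return []

    # Order lines top to bottom by their top edge.
    lines = sorted(line_bboxes, key=lambda b: b[1])

    # Gap between the bottom of one line and the top of the next;
    # overlapping lines are treated as having zero gap.
    gaps = [max(0.0, cur[1] - prev[3]) for prev, cur in zip(lines, lines[1:])]
    if not gaps:
        return [lines]  # a single line forms a single cluster

    threshold = gap_factor * median(gaps)

    clusters: List[List[BBox]] = [[lines[0]]]
    for gap, line in zip(gaps, lines[1:]):
        if gap > threshold:
            clusters.append([line])    # large gap: start a new list item
        else:
            clusters[-1].append(line)  # small gap: continue the current item
    return clusters


def merge_bbox(boxes: List[BBox]) -> BBox:
    """Union bounding box of a non-empty cluster of lines."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
```

Because the threshold is relative to the median, the split adapts to the line spacing of each block rather than relying on a fixed pixel distance. One caveat of this sketch: when most lines touch or overlap, the median gap is zero and any positive gap starts a new cluster.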

github-actions Bot commented Jun 3, 2026

Contributor

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this pull request.

Wreck-X commented Jun 3, 2026

Author

I have read the CLA Document and I hereby sign the CLA

Wreck-X commented Jun 3, 2026

Author

recheck

Wreck-X closed this Jun 3, 2026
github-actions Bot locked and limited conversation to collaborators Jun 3, 2026